Chapter 8 - Exponential smoothing
R. Hyndman/R. J. Serrano
7/30/2022
Exponential smoothing
Historical perspective
- Developed in the 1950s and 1960s as methods (algorithms) to produce point forecasts.
- Combine a “level”, “trend” (slope) and “seasonal” component to describe a time series.
- The rate of change of each component is controlled by a “smoothing parameter”: \(\alpha\), \(\beta\) and \(\gamma\) respectively.
- Need to choose best values for the smoothing parameters (and initial states).
- Equivalent ETS state space models developed in the 1990s and 2000s.
Big idea: control the rate of change
\(\alpha\) controls the flexibility of the level
- If \(\alpha = 0\), the level never updates (mean)
- If \(\alpha = 1\), the level updates completely (naive)
\(\beta\) controls the flexibility of the trend
- If \(\beta = 0\), the trend is linear
- If \(\beta = 1\), the trend changes suddenly every observation
\(\gamma\) controls the flexibility of the seasonality
- If \(\gamma = 0\), the seasonality is fixed (seasonal means)
- If \(\gamma = 1\), the seasonality updates completely (seasonal naive)
A model for levels, trends, and seasonalities
We want a model that captures the level (\(\ell_t\)), trend (\(b_t\)) and seasonality (\(s_t\)).
ETS models
Error: additive ("A") or multiplicative ("M")
Trend: none ("N"), additive ("A"), multiplicative ("M"), or damped ("Ad" or "Md")
Seasonality: none ("N"), additive ("A") or multiplicative ("M")
Simple exponential smoothing
Simple methods
Time series \(y_1,y_2,\dots,y_T\).
- Want something in between these methods.
- Most recent data should have more weight.
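The two benchmarks referred to here are presumably the naive and average methods (an assumption, based on the \(\alpha=1\) and \(\alpha=0\) limits described earlier). In the notation of this chapter they can be sketched as:
\[\begin{align*}
\text{Naive:}\quad \hat{y}_{T+h|T} &= y_T\\
\text{Average:}\quad \hat{y}_{T+h|T} &= \frac{1}{T}\sum_{t=1}^T y_t
\end{align*}\]
The naive method puts all weight on the last observation, while the average weights every observation equally.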
Simple Exponential Smoothing
- \(\ell_t\) is the level (or the smoothed value) of the series at time t.
- \(\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha) \hat{y}_{t|t-1}\)
Iterate to get exponentially weighted moving average form.
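A sketch of that iteration, assuming the usual SES component form (forecast \(\hat{y}_{t+1|t}=\ell_t\), smoothing \(\ell_t=\alpha y_t+(1-\alpha)\ell_{t-1}\)) with initial level \(\ell_0\):
\[\begin{align*}
\hat{y}_{T+1|T} &= \alpha y_T + (1-\alpha)\hat{y}_{T|T-1}\\
&= \alpha y_T + \alpha(1-\alpha) y_{T-1} + (1-\alpha)^2\hat{y}_{T-1|T-2}\\
&= \sum_{j=0}^{T-1}\alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T\ell_0
\end{align*}\]
The weights \(\alpha(1-\alpha)^j\) decay geometrically with the age \(j\) of the observation, which is where the name "exponential smoothing" comes from.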
Optimising smoothing parameters
- Need to choose best values for \(\alpha\) and \(\ell_0\).
- Similarly to regression, choose optimal parameters by minimising SSE: \[
\text{SSE}=\sum_{t=1}^T(y_t - \hat{y}_{t|t-1})^2.
\]
- Unlike regression there is no closed form solution — use numerical optimization.
- For Algerian Exports example:
- \(\hat\alpha = 0.8400\)
- \(\hat\ell_0 = 39.54\)
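The optimisation step can be illustrated with a minimal base-R sketch. It is not how fable estimates the model; it simply minimises the SSE above with `optim()`. The function name `ses_sse` and the simulated series are hypothetical, and the data are not the Algerian Exports series.

```r
# SSE of one-step SES forecasts; par = c(alpha, l0)
ses_sse <- function(par, y) {
  alpha <- par[1]
  level <- par[2]                    # l_0, the initial level
  sse <- 0
  for (obs in y) {
    err   <- obs - level             # e_t = y_t - l_{t-1}
    sse   <- sse + err^2
    level <- level + alpha * err     # l_t = l_{t-1} + alpha * e_t
  }
  sse
}

set.seed(1)
y <- 40 + cumsum(rnorm(60, sd = 2))  # hypothetical series for illustration

# Numerical optimisation with alpha restricted to [0, 1]
start <- c(alpha = 0.5, l0 = y[1])
opt <- optim(start, ses_sse, y = y, method = "L-BFGS-B",
             lower = c(0, -Inf), upper = c(1, Inf))
opt$par    # estimated alpha and l_0
opt$value  # minimised SSE
```

Different starting values or optimisers may give slightly different estimates, which is one reason software packages can disagree in the last digits.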

Models and methods
Methods
- Algorithms that return point forecasts.
Models
- Generate same point forecasts but can also generate forecast distributions.
- A stochastic (or random) data generating process that can generate an entire forecast distribution.
- Allow for “proper” model selection.
ETS(A,N,N): SES with additive errors
Forecast error:
\(e_t = y_t - \hat{y}_{t|t-1} = y_t - \ell_{t-1}\).
Specify a probability distribution for \(e_t\): we assume \(e_t = \varepsilon_t\sim\text{NID}(0,\sigma^2)\).
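ETS(A,N,N): SES with additive errors
\[\begin{align*}
y_t&=\ell_{t-1}+\varepsilon_t && \text{(measurement equation)}\\
\ell_t&=\ell_{t-1}+\alpha \varepsilon_t && \text{(state equation)}
\end{align*}\]
where \(\varepsilon_t\sim\text{NID}(0,\sigma^2)\).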
- “innovations” or “single source of error” because equations have the same error process, \(\varepsilon_t\).
- Measurement equation: relationship between observations and states.
- State equation(s): evolution of the state(s) through time.
ETS(M,N,N): SES with multiplicative errors.
- Specify relative errors \(\varepsilon_t=\frac{y_t-\hat{y}_{t|t-1}}{\hat{y}_{t|t-1}}\sim \text{NID}(0,\sigma^2)\)
- Substituting \(\hat{y}_{t|t-1}=\ell_{t-1}\) gives:
- \(y_t = \ell_{t-1}+\ell_{t-1}\varepsilon_t\)
- \(e_t = y_t - \hat{y}_{t|t-1} = \ell_{t-1}\varepsilon_t\)
- Models with additive and multiplicative errors with the same parameters generate the same point forecasts but different prediction intervals.
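Putting these pieces together, and assuming the same smoothing equation \(\ell_t=\alpha y_t+(1-\alpha)\ell_{t-1}\) as in the additive case, the ETS(M,N,N) model can be sketched as:
\[\begin{align*}
y_t&=\ell_{t-1}(1+\varepsilon_t)\\
\ell_t&=\ell_{t-1}(1+\alpha \varepsilon_t)
\end{align*}\]
The state equation follows from \(\ell_t=\ell_{t-1}+\alpha e_t\) with \(e_t=\ell_{t-1}\varepsilon_t\).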
ETS(A,N,N): Specifying the model
ETS(y ~ error("A") + trend("N") + season("N"))
By default, optimal values for \(\alpha\) and \(\ell_0\) are used.
\(\alpha\) can be chosen manually in trend().
trend("N", alpha = 0.5)
trend("N", alpha_range = c(0.2, 0.8))
Example: Algerian Exports
algeria_economy <- global_economy %>%
  filter(Country == "Algeria")
fit <- algeria_economy %>%
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))
report(fit)
## Series: Exports
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.84
##
## Initial states:
## l[0]
## 39.5
##
## sigma^2: 35.6
##
## AIC AICc BIC
## 447 447 453
Example: Algerian Exports
components(fit) %>% autoplot()

Example: Algerian Exports
components(fit) %>%
  left_join(fitted(fit), by = c("Country", ".model", "Year"))
Example: Algerian Exports
fit %>%
  forecast(h = 5) %>%
  autoplot(algeria_economy) +
  labs(y = "% of GDP", title = "Exports: Algeria")

Models with trend
Holt’s linear trend
- Two smoothing parameters \(\alpha\) and \(\beta^*\) (\(0\le\alpha,\beta^*\le1\)).
- \(\ell_t\) level: weighted average between \(y_t\) and the one-step-ahead forecast for time \(t\), \((\ell_{t-1} + b_{t-1}=\hat{y}_{t|t-1})\)
- \(b_t\) slope: weighted average of \((\ell_{t} - \ell_{t-1})\) and \(b_{t-1}\), current and previous estimate of slope.
- Choose \(\alpha, \beta^*, \ell_0, b_0\) to minimise SSE.
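For reference, the verbal descriptions above correspond to the usual component form of Holt's linear trend method, sketched here in the notation of this chapter:
\[\begin{align*}
\hat{y}_{t+h|t} &= \ell_t + h b_t\\
\ell_t &= \alpha y_t + (1-\alpha)(\ell_{t-1}+b_{t-1})\\
b_t &= \beta^*(\ell_t-\ell_{t-1}) + (1-\beta^*)b_{t-1}
\end{align*}\]
The forecast function is a straight line in \(h\), starting at the last level with the last estimated slope.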
ETS(A,A,N)
Holt’s linear method with additive errors.
- Assume \(\varepsilon_t=y_t-\ell_{t-1}-b_{t-1} \sim \text{NID}(0,\sigma^2)\).
- Substituting into the error correction equations for Holt’s linear method \[\begin{align*}
y_t&=\ell_{t-1}+b_{t-1}+\varepsilon_t\\
\ell_t&=\ell_{t-1}+b_{t-1}+\alpha \varepsilon_t\\
b_t&=b_{t-1}+\alpha\beta^* \varepsilon_t
\end{align*}\]
- For simplicity, set \(\beta=\alpha \beta^*\).
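The error correction form translates almost line for line into code. The following base-R sketch is illustrative only: the function name `holt_additive` and its arguments are hypothetical, and the parameters are supplied by hand rather than estimated.

```r
# Holt's linear method in error correction form (ETS(A,A,N) recursions)
holt_additive <- function(y, alpha, beta, l0, b0, h = 5) {
  level <- l0
  slope <- b0
  for (obs in y) {
    err   <- obs - (level + slope)        # epsilon_t = y_t - l_{t-1} - b_{t-1}
    level <- level + slope + alpha * err  # l_t
    slope <- slope + beta * err           # b_t, with beta = alpha * beta_star
  }
  level + seq_len(h) * slope              # point forecasts l_T + h * b_T
}

# A perfectly linear series: every one-step error is zero
y  <- 2 * (1:10)
fc <- holt_additive(y, alpha = 0.8, beta = 0.2, l0 = 0, b0 = 2, h = 3)
fc  # 22 24 26
```

Because the errors are all zero in this toy series, the states never adjust and the forecasts simply continue the line, whatever values of \(\alpha\) and \(\beta\) are used.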
Exponential smoothing: trend/slope
ETS(M,A,N)
Holt’s linear method with multiplicative errors.
- Assume \(\varepsilon_t=\frac{y_t-(\ell_{t-1}+b_{t-1})}{(\ell_{t-1}+b_{t-1})}\)
- Following a similar approach as above, the innovations state space model underlying Holt’s linear method with multiplicative errors is specified as \[\begin{align*}
y_t&=(\ell_{t-1}+b_{t-1})(1+\varepsilon_t)\\
\ell_t&=(\ell_{t-1}+b_{t-1})(1+\alpha \varepsilon_t)\\
b_t&=b_{t-1}+\beta(\ell_{t-1}+b_{t-1}) \varepsilon_t
\end{align*}\] where again \(\beta=\alpha \beta^*\) and \(\varepsilon_t \sim \text{NID}(0,\sigma^2)\).
ETS(A,A,N): Specifying the model
ETS(y ~ error("A") + trend("A") + season("N"))
By default, optimal values for \(\beta\) and \(b_0\) are used.
\(\beta\) can be chosen manually in trend().
trend("A", beta = 0.004)
trend("A", beta_range = c(0, 0.1))
Example: Australian population
aus_economy <- global_economy %>%
  filter(Code == "AUS") %>%
  mutate(Pop = Population / 1e6)
fit <- aus_economy %>%
  model(AAN = ETS(Pop ~ error("A") + trend("A") + season("N")))
report(fit)
## Series: Pop
## Model: ETS(A,A,N)
## Smoothing parameters:
## alpha = 1
## beta = 0.327
##
## Initial states:
## l[0] b[0]
## 10.1 0.222
##
## sigma^2: 0.0041
##
## AIC AICc BIC
## -77.0 -75.8 -66.7
Example: Australian population
components(fit) %>% autoplot()

Example: Australian population
components(fit) %>%
  left_join(fitted(fit), by = c("Country", ".model", "Year"))
Example: Australian population
fit %>%
  forecast(h = 10) %>%
  autoplot(aus_economy) +
  labs(y = "Millions", title = "Population: Australia")

Damped trend method
- Damping parameter \(0<\phi<1\).
- If \(\phi=1\), identical to Holt’s linear trend.
- As \(h\rightarrow\infty\), \(\hat{y}_{T+h|T}\rightarrow \ell_T+\phi b_T/(1-\phi)\).
- Short-run forecasts trended, long-run forecasts constant.
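For reference, a sketch of the damped trend method in its usual component form, with \(\phi\) damping the slope at every step:
\[\begin{align*}
\hat{y}_{t+h|t} &= \ell_t + (\phi+\phi^2+\dots+\phi^h) b_t\\
\ell_t &= \alpha y_t + (1-\alpha)(\ell_{t-1}+\phi b_{t-1})\\
b_t &= \beta^*(\ell_t-\ell_{t-1}) + (1-\beta^*)\phi b_{t-1}
\end{align*}\]
The long-run limit follows from the geometric series \(\sum_{i=1}^{h}\phi^i = \phi(1-\phi^h)/(1-\phi)\), which tends to \(\phi/(1-\phi)\) as \(h\rightarrow\infty\) when \(0<\phi<1\).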
Example: Australian population
aus_economy %>%
  model(damped = ETS(Pop ~ error("A") + trend("Ad") + season("N"))) %>%
  forecast(h = 20) %>%
  autoplot(aus_economy)

Example: Australian population
fit <- aus_economy %>%
  filter(Year <= 2010) %>%
  model(
    ses = ETS(Pop ~ error("A") + trend("N") + season("N")),
    holt = ETS(Pop ~ error("A") + trend("A") + season("N")),
    damped = ETS(Pop ~ error("A") + trend("Ad") + season("N"))
  )
Example: Australian population
| | SES | Holt | Damped |
|---|---|---|---|
| \(\alpha\) | 1.00 | 1.00 | 1.00 |
| \(\beta^*\) | | 0.30 | 0.40 |
| \(\phi\) | | | 0.98 |
| \(b_0\) | | 0.22 | 0.25 |
| \(\ell_0\) | 10.28 | 10.05 | 10.04 |
| Training RMSE | 0.24 | 0.06 | 0.07 |
| Test RMSE | 1.63 | 0.15 | 0.21 |
| Test MASE | 6.18 | 0.55 | 0.75 |
| Test MAPE | 6.09 | 0.55 | 0.74 |
| Test MAE | 1.45 | 0.13 | 0.18 |
Models with seasonality
Holt-Winters additive method
Holt and Winters extended Holt’s method to capture seasonality.
- \(k=\) integer part of \((h-1)/m\). Ensures estimates from the final year are used for forecasting.
- Parameters: \(0\le \alpha\le 1\), \(0\le \beta^*\le 1\), \(0\le \gamma\le 1-\alpha\) and \(m=\) period of seasonality (e.g. \(m=4\) for quarterly data).
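For reference, a sketch of the Holt-Winters additive method in its usual component form, using the \(k\) and \(m\) defined above:
\[\begin{align*}
\hat{y}_{t+h|t} &= \ell_t + h b_t + s_{t+h-m(k+1)}\\
\ell_t &= \alpha(y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1}+b_{t-1})\\
b_t &= \beta^*(\ell_t-\ell_{t-1}) + (1-\beta^*)b_{t-1}\\
s_t &= \gamma(y_t-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m}
\end{align*}\]
The index \(t+h-m(k+1)\) always points at the most recent estimate of the seasonal term for the season being forecast.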
Holt-Winters additive method
- Seasonal component is usually expressed as \(s_{t} = \gamma^* (y_{t}-\ell_{t})+ (1-\gamma^*)s_{t-m}.\)
- Substitute in for \(\ell_t\): \(s_{t} = \gamma^*(1-\alpha) (y_{t}-\ell_{t-1}-b_{t-1})+ [1-\gamma^*(1-\alpha)]s_{t-m}\)
- We set \(\gamma=\gamma^*(1-\alpha)\).
- The usual parameter restriction is \(0\le\gamma^*\le1\), which translates to \(0\le\gamma\le(1-\alpha)\).
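The substitution step can be worked through as follows, assuming the level equation \(\ell_t=\alpha(y_t-s_{t-m})+(1-\alpha)(\ell_{t-1}+b_{t-1})\) of the additive method:
\[\begin{align*}
y_t-\ell_t &= (1-\alpha)(y_t-\ell_{t-1}-b_{t-1}) + \alpha s_{t-m}\\
s_t &= \gamma^*(y_t-\ell_t) + (1-\gamma^*)s_{t-m}\\
&= \gamma^*(1-\alpha)(y_t-\ell_{t-1}-b_{t-1}) + \gamma^*\alpha s_{t-m} + (1-\gamma^*)s_{t-m}\\
&= \gamma^*(1-\alpha)(y_t-\ell_{t-1}-b_{t-1}) + [1-\gamma^*(1-\alpha)]s_{t-m}
\end{align*}\]
Since \(0\le\gamma^*\le1\), the product \(\gamma=\gamma^*(1-\alpha)\) can be no larger than \(1-\alpha\).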
Exponential smoothing: seasonality
ETS(A,A,A)
Holt-Winters additive method with additive errors.
- Forecast errors: \(\varepsilon_{t} = y_t - \hat{y}_{t|t-1}\)
- \(k\) is integer part of \((h-1)/m\).
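For reference, a sketch of the ETS(A,A,A) model equations in the usual innovations form, with \(\beta=\alpha\beta^*\) and \(\varepsilon_t\sim\text{NID}(0,\sigma^2)\) as in the earlier models:
\[\begin{align*}
y_t&=\ell_{t-1}+b_{t-1}+s_{t-m}+\varepsilon_t\\
\ell_t&=\ell_{t-1}+b_{t-1}+\alpha \varepsilon_t\\
b_t&=b_{t-1}+\beta \varepsilon_t\\
s_t&=s_{t-m}+\gamma \varepsilon_t
\end{align*}\]
Each state is the previous state plus a fraction of the same one-step error.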
Holt-Winters multiplicative method
Seasonal variations change in proportion to the level of the series.
- \(k\) is integer part of \((h-1)/m\).
- Additive method: \(s_t\) in absolute terms — within each year \(\sum_i s_i \approx 0\).
- Multiplicative method: \(s_t\) in relative terms — within each year \(\sum_i s_i \approx m\).
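For reference, a sketch of the Holt-Winters multiplicative method in its usual component form; the seasonal term now scales the level and trend instead of being added to them:
\[\begin{align*}
\hat{y}_{t+h|t} &= (\ell_t + h b_t)\, s_{t+h-m(k+1)}\\
\ell_t &= \alpha\frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1}+b_{t-1})\\
b_t &= \beta^*(\ell_t-\ell_{t-1}) + (1-\beta^*)b_{t-1}\\
s_t &= \gamma\frac{y_t}{\ell_{t-1}+b_{t-1}} + (1-\gamma)s_{t-m}
\end{align*}\]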
ETS(M,A,M)
Holt-Winters multiplicative method with multiplicative errors.
- Forecast errors: \(\varepsilon_{t} = (y_t - \hat{y}_{t|t-1})/\hat{y}_{t|t-1}\)
- \(k\) is integer part of \((h-1)/m\).
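For reference, a sketch of the ETS(M,A,M) model equations in the usual innovations form, with \(\beta=\alpha\beta^*\) and \(\varepsilon_t\sim\text{NID}(0,\sigma^2)\):
\[\begin{align*}
y_t&=(\ell_{t-1}+b_{t-1})s_{t-m}(1+\varepsilon_t)\\
\ell_t&=(\ell_{t-1}+b_{t-1})(1+\alpha \varepsilon_t)\\
b_t&=b_{t-1}+\beta(\ell_{t-1}+b_{t-1}) \varepsilon_t\\
s_t&=s_{t-m}(1+\gamma \varepsilon_t)
\end{align*}\]
The level and slope equations match those of ETS(M,A,N) given earlier; only the measurement and seasonal equations are new.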
Example: Australian holiday tourism
aus_holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  summarise(Trips = sum(Trips))
fit <- aus_holidays %>%
  model(
    additive = ETS(Trips ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(Trips ~ error("M") + trend("A") + season("M"))
  )
fc <- fit %>% forecast()
Example: Australian holiday tourism
fc %>%
  autoplot(aus_holidays, level = NULL) +
  labs(y = "Thousands", title = "Overnight trips")

Estimated components

Holt-Winters damped method
Often the single most accurate forecasting method for seasonal data.
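For reference, a sketch of the Holt-Winters method with a damped trend and multiplicative seasonality in its usual component form, reusing the damping parameter \(\phi\) and the index \(k\) from earlier sections (choosing the multiplicative variant here is an assumption):
\[\begin{align*}
\hat{y}_{t+h|t} &= [\ell_t + (\phi+\phi^2+\dots+\phi^h) b_t]\, s_{t+h-m(k+1)}\\
\ell_t &= \alpha\frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1}+\phi b_{t-1})\\
b_t &= \beta^*(\ell_t-\ell_{t-1}) + (1-\beta^*)\phi b_{t-1}\\
s_t &= \gamma\frac{y_t}{\ell_{t-1}+\phi b_{t-1}} + (1-\gamma)s_{t-m}
\end{align*}\]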
Holt-Winters with daily data
sth_cross_ped <- pedestrian %>%
  filter(
    Date >= "2016-07-01",
    Sensor == "Southern Cross Station"
  ) %>%
  index_by(Date) %>%
  summarise(Count = sum(Count) / 1000)
sth_cross_ped %>%
  filter(Date <= "2016-07-31") %>%
  model(
    hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))
  ) %>%
  forecast(h = "2 weeks") %>%
  autoplot(sth_cross_ped %>% filter(Date <= "2016-08-14")) +
  labs(
    title = "Daily traffic: Southern Cross",
    y = "Pedestrians ('000)"
  )
Holt-Winters with daily data

Innovations state space models
Exponential smoothing methods
ETS models
Additive error models
Multiplicative error models
Estimating ETS models
- Smoothing parameters \(\alpha\), \(\beta\), \(\gamma\) and \(\phi\), and the initial states \(\ell_0\), \(b_0\), \(s_0,s_{-1},\dots,s_{-m+1}\) are estimated by maximising the “likelihood” = the probability of the data arising from the specified model.
- For models with additive errors, this is equivalent to minimising SSE.
- For models with multiplicative errors, it is not equivalent to minimising SSE.
Innovations state space models
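Let
\(\boldsymbol{x}_t = (\ell_t, b_t, s_t, s_{t-1}, \dots, s_{t-m+1})\) and
\(\varepsilon_t\stackrel{\text{iid}}{\sim} \text{N}(0,\sigma^2)\).
\[\begin{align*}
y_t &= h(\boldsymbol{x}_{t-1}) + k(\boldsymbol{x}_{t-1})\varepsilon_t, \qquad \mu_t = h(\boldsymbol{x}_{t-1})\\
\boldsymbol{x}_t &= f(\boldsymbol{x}_{t-1}) + g(\boldsymbol{x}_{t-1})\varepsilon_t
\end{align*}\]
- Additive errors
- \(k(\boldsymbol{x}_{t-1})=1\), so \(y_t = \mu_{t} + \varepsilon_t\).
- Multiplicative errors
- \(k(\boldsymbol{x}_{t-1}) = \mu_{t}\), so \(y_t = \mu_{t}(1 + \varepsilon_t)\). Here \(\varepsilon_t = (y_t - \mu_t)/\mu_t\) is the relative error.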
Innovations state space models
- Estimate parameters \(\boldsymbol\theta = (\alpha,\beta,\gamma,\phi)\) and initial states \(\boldsymbol{x}_0 = (\ell_0,b_0,s_0,s_{-1},\dots,s_{-m+1})\) by minimizing \(L^*\).
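Here \(L^*\) denotes minus twice the log-likelihood, up to a constant. A sketch of its usual form for innovations state space models, using the function \(k(\cdot)\) from the error definitions above:
\[
L^*(\boldsymbol\theta,\boldsymbol{x}_0) = T\log\left(\sum_{t=1}^T \varepsilon^2_t\right) + 2\sum_{t=1}^T \log|k(\boldsymbol{x}_{t-1})| = -2\log(\text{Likelihood}) + \text{constant}
\]
With additive errors \(k=1\), so the second sum vanishes and minimising \(L^*\) reduces to minimising the sum of squared errors. With multiplicative errors the second sum depends on the states, so the two criteria differ.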
Parameter restrictions
Usual region
- Traditional restrictions in the methods: \(0< \alpha,\beta^*,\gamma^*,\phi<1\) (equations interpreted as weighted averages).
- In models we set \(\beta=\alpha\beta^*\) and \(\gamma=(1-\alpha)\gamma^*\).
- Therefore \(0< \alpha <1\), \(0 < \beta < \alpha\) and \(0< \gamma < 1-\alpha\).
- \(0.8<\phi<0.98\) — to prevent numerical difficulties.
Admissible region
- To prevent observations in the distant past having a continuing effect on current forecasts.
- Usually (but not always) less restrictive than the traditional region.
- For example, for ETS(A,N,N): traditional \(0< \alpha <1\) while admissible \(0< \alpha <2\).
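The ETS(A,N,N) bound can be checked directly. Writing the level recursion as a weighted sum of past observations (a sketch, assuming the SES smoothing equation \(\ell_t=\alpha y_t+(1-\alpha)\ell_{t-1}\)):
\[
\ell_t = \sum_{j=0}^{t-1}\alpha(1-\alpha)^j y_{t-j} + (1-\alpha)^t\ell_0
\]
The influence of distant observations dies out only if \((1-\alpha)^j\rightarrow0\), that is \(|1-\alpha|<1\), which gives \(0<\alpha<2\). For \(1<\alpha<2\) the weights alternate in sign, so the level is no longer a weighted average, but the model is still stable.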
Model selection
\[\text{AIC} = -2\log(L) + 2k\]
where \(L\) is the likelihood and \(k\) is the number of parameters and initial states estimated in the model.
\[\text{AIC}_{\text{c}} = \text{AIC} + \frac{2k(k+1)}{T-k-1}\]
which is the AIC corrected (for small sample bias).
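The criteria are simple to compute once the log-likelihood is known. The base-R sketch below uses the standard definitions \(\text{AIC}=-2\log(L)+2k\), \(\text{AIC}_{\text{c}}=\text{AIC}+2k(k+1)/(T-k-1)\) and \(\text{BIC}=\text{AIC}+k[\log(T)-2]\). The function name `info_criteria` and the input values are hypothetical and chosen only for illustration; they are not output from any fitted model.

```r
# Information criteria from a log-likelihood
info_criteria <- function(loglik, k, n_obs) {
  aic  <- -2 * loglik + 2 * k
  aicc <- aic + 2 * k * (k + 1) / (n_obs - k - 1)  # small-sample correction
  bic  <- aic + k * (log(n_obs) - 2)
  c(AIC = aic, AICc = aicc, BIC = bic)
}

# Hypothetical inputs: log-likelihood -220.5, k = 3, T = 58 observations
ic <- info_criteria(loglik = -220.5, k = 3, n_obs = 58)
ic
```

The AICc penalty is always larger than the AIC penalty and the gap shrinks as \(T\) grows, so the two criteria agree on long series.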
AIC and cross-validation
Automatic forecasting
From Hyndman et al. (IJF, 2002):
- Apply each model that is appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion).
- Select the best method using the AICc.
- Produce forecasts using the best method.
- Obtain forecast intervals using underlying state space model.
Method performed very well in M3 competition.
Example: National populations
fit <- global_economy %>%
  mutate(Pop = Population / 1e6) %>%
  model(ets = ETS(Pop))
fit
Example: Australian holiday tourism
holidays <- tourism %>%
  filter(Purpose == "Holiday")
fit <- holidays %>% model(ets = ETS(Trips))
fit
Example: Australian holiday tourism
fit %>%
  filter(Region == "Snowy Mountains") %>%
  report()
## Series: Trips
## Model: ETS(M,N,A)
## Smoothing parameters:
## alpha = 0.157
## gamma = 1e-04
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3]
## 142 -61 131 -42.2 -27.7
##
## sigma^2: 0.0388
##
## AIC AICc BIC
## 852 854 869
Example: Australian holiday tourism
fit %>%
  filter(Region == "Snowy Mountains") %>%
  components()
Example: Australian holiday tourism
fit %>%
  filter(Region == "Snowy Mountains") %>%
  components() %>%
  autoplot()

Example: Australian holiday tourism
fit %>%
  forecast() %>%
  filter(Region == "Snowy Mountains") %>%
  autoplot(holidays) +
  labs(y = "Thousands", title = "Overnight trips")

Residuals
Response residuals
\[\hat{e}_t = y_t - \hat{y}_{t|t-1}\]
Innovation residuals
Additive error model: \[\hat\varepsilon_t = y_t - \hat{y}_{t|t-1}\]
Multiplicative error model: \[\hat\varepsilon_t = \frac{y_t - \hat{y}_{t|t-1}}{\hat{y}_{t|t-1}}\]
Example: Australian holiday tourism
aus_holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  summarise(Trips = sum(Trips))
fit <- aus_holidays %>%
  model(ets = ETS(Trips)) %>%
  report()
## Series: Trips
## Model: ETS(M,N,M)
## Smoothing parameters:
## alpha = 0.358
## gamma = 0.000969
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3]
## 9667 0.943 0.927 0.968 1.16
##
## sigma^2: 0.0022
##
## AIC AICc BIC
## 1331 1333 1348
Example: Australian holiday tourism
residuals(fit)
residuals(fit, type = "response")

Some unstable models
- Some of the combinations of (Error, Trend, Seasonal) can lead to numerical difficulties; see equations with division by a state.
- These are: ETS(A,N,M), ETS(A,A,M), ETS(A,Ad,M).
- Models with multiplicative errors are useful for strictly positive data, but are not numerically stable with data containing zeros or negative values. In that case only the six fully additive models will be applied.
Exponential smoothing models
Forecasting with exponential smoothing
Forecasting with ETS models
Traditional point forecasts: iterate the equations for \(t=T+1,T+2,\dots,T+h\) and set all \(\varepsilon_t=0\) for \(t>T\).
- Not the same as \(\text{E}(y_{t+h} | \boldsymbol{x}_t)\) unless seasonality is additive.
fable uses \(\text{E}(y_{t+h} | \boldsymbol{x}_t)\).
- Point forecasts for ETS(A,*,*) are identical to ETS(M,*,*) if the parameters are the same.
Example: ETS(A,A,N)
\[\begin{align*}
y_{T+1} &= \ell_T + b_T + \varepsilon_{T+1}\\
\hat{y}_{T+1|T} & = \ell_{T}+b_{T}\\
y_{T+2} & = \ell_{T+1} + b_{T+1} + \varepsilon_{T+2}\\
& =
(\ell_T + b_T + \alpha\varepsilon_{T+1}) +
(b_T + \beta \varepsilon_{T+1}) +
\varepsilon_{T+2} \\
\hat{y}_{T+2|T} &= \ell_{T}+2b_{T}
\end{align*}\] etc.
Example: ETS(M,A,N)
\[\begin{align*}
y_{T+1} &= (\ell_T + b_T )(1+ \varepsilon_{T+1})\\
\hat{y}_{T+1|T} & = \ell_{T}+b_{T}.\\
y_{T+2} & = (\ell_{T+1} + b_{T+1})(1 + \varepsilon_{T+2})\\
& = \left\{
(\ell_T + b_T) (1+ \alpha\varepsilon_{T+1}) +
\left[b_T + \beta (\ell_T + b_T)\varepsilon_{T+1}\right]
\right\}
(1 + \varepsilon_{T+2}) \\
\hat{y}_{T+2|T} &= \ell_{T}+2b_{T}
\end{align*}\] etc.
Forecasting with ETS models
Prediction intervals can only be generated using the models.
- The prediction intervals will differ between models with additive and multiplicative errors.
- Exact formulae for some models.
- More general to simulate future sample paths, conditional on the last estimate of the states, and to obtain prediction intervals from the percentiles of these simulated future paths.
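The simulation approach in the last bullet can be sketched in base R for the simplest case, ETS(A,N,N). This is an illustration, not fable's implementation: the function name `simulate_ann_paths` is hypothetical, \(\alpha=0.84\) and \(\sigma^2=35.6\) are taken from the Algerian Exports report, and the final level of 40 is an assumed value.

```r
# Simulate future sample paths from ETS(A,N,N), conditional on the last level
simulate_ann_paths <- function(level, alpha, sigma, h, n_paths = 5000) {
  paths <- matrix(NA_real_, nrow = n_paths, ncol = h)
  for (i in seq_len(n_paths)) {
    l   <- level
    eps <- rnorm(h, sd = sigma)   # future innovations
    for (j in seq_len(h)) {
      paths[i, j] <- l + eps[j]   # y_{T+j} = l_{T+j-1} + eps_{T+j}
      l <- l + alpha * eps[j]     # l_{T+j} = l_{T+j-1} + alpha * eps_{T+j}
    }
  }
  paths
}

set.seed(42)
paths <- simulate_ann_paths(level = 40, alpha = 0.84,
                            sigma = sqrt(35.6), h = 5)

# 95% prediction intervals from the percentiles of the simulated paths
pi95 <- apply(paths, 2, quantile, probs = c(0.025, 0.975))
pi95
```

The intervals widen with the horizon because each simulated innovation feeds into the level and so persists in all later values of the path.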
Prediction intervals
Example: Corticosteroid drug sales
h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost))
h02 %>% autoplot(Cost)

Example: Corticosteroid drug sales
h02 %>%
  model(ETS(Cost)) %>%
  report()
## Series: Cost
## Model: ETS(M,Ad,M)
## Smoothing parameters:
## alpha = 0.307
## beta = 0.000101
## gamma = 0.000101
## phi = 0.978
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6] s[-7] s[-8] s[-9]
## 417269 8206 0.872 0.826 0.756 0.773 0.687 1.28 1.32 1.18 1.16 1.1
## s[-10] s[-11]
## 1.05 0.981
##
## sigma^2: 0.0046
##
## AIC AICc BIC
## 5515 5519 5575
Example: Corticosteroid drug sales
h02 %>%
  model(ETS(Cost ~ error("A") + trend("A") + season("A"))) %>%
  report()
## Series: Cost
## Model: ETS(A,A,A)
## Smoothing parameters:
## alpha = 0.17
## beta = 0.00631
## gamma = 0.455
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6] s[-7]
## 409706 9097 -99075 -136602 -191496 -174531 -241437 210644 244644 145368
## s[-8] s[-9] s[-10] s[-11]
## 130570 84458 39132 -11674
##
## sigma^2: 3.5e+09
##
## AIC AICc BIC
## 5585 5589 5642
Example: Corticosteroid drug sales
h02 %>%
  model(ETS(Cost)) %>%
  forecast() %>%
  autoplot(h02)

Example: Corticosteroid drug sales
h02 %>%
  model(
    auto = ETS(Cost),
    AAA = ETS(Cost ~ error("A") + trend("A") + season("A"))
  ) %>%
  accuracy()
| Model | MAE | RMSE | MAPE | MASE | RMSSE |
|---|---|---|---|---|---|
| auto | 38649 | 51102 | 4.99 | 0.638 | 0.689 |
| AAA | 43378 | 56784 | 6.05 | 0.716 | 0.766 |